8 research outputs found

    Rewards and errors in multi-arm bandits for interactive education

    Get PDF
    In multi-armed bandits, the most common objective is the maximization of the cumulative reward. Alternative settings include active exploration, where a learner tries to obtain accurate estimates of the rewards of all arms. While these objectives conflict, in many scenarios it is desirable to trade off rewards and errors. For instance, in educational games the designer wants to gather generalizable knowledge about the behavior of the students and the teaching strategies (small estimation errors), but, at the same time, the system needs to avoid giving players a bad experience, which may make them leave the system permanently (large reward). In this paper, we formalize this tradeoff and introduce the ForcingBalance algorithm, whose performance is provably close to that of the best possible tradeoff strategy. Finally, we demonstrate on real-world educational data that ForcingBalance returns useful information about the arms without compromising the overall reward.
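
    A minimal sketch of the forcing idea the abstract alludes to: any arm pulled fewer than a threshold number of times is played first (keeping every estimate accurate), otherwise the empirically best arm is played (collecting reward). The function name, the sqrt(t) forcing schedule, and the Gaussian reward model are illustrative assumptions, not the paper's ForcingBalance algorithm or its guarantees.

        import numpy as np

        rng = np.random.default_rng(0)

        def forced_exploration_bandit(means, horizon):
            # Sketch of forced exploration, NOT the paper's ForcingBalance.
            k = len(means)
            counts = np.zeros(k)
            sums = np.zeros(k)
            total_reward = 0.0
            for t in range(1, horizon + 1):
                under = np.where(counts < np.sqrt(t))[0]   # arms below the forcing threshold
                if under.size > 0:
                    arm = under[np.argmin(counts[under])]  # force the least-pulled arm
                else:
                    arm = int(np.argmax(sums / counts))    # otherwise exploit greedily
                r = rng.normal(means[arm], 1.0)            # Gaussian rewards (an assumption)
                counts[arm] += 1
                sums[arm] += r
                total_reward += r
            return sums / counts, total_reward             # estimates and cumulative reward

        estimates, reward = forced_exploration_bandit([0.2, 0.5, 0.8], horizon=10_000)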

    Trading off rewards and errors in multi-armed bandits

    Get PDF
    In multi-armed bandits, the most common objective is the maximization of the cumulative reward. Alternative settings include active exploration, where a learner tries to obtain accurate estimates of the rewards of all arms. While these objectives conflict, in many scenarios it is desirable to trade off rewards and errors. For instance, in educational games the designer wants to gather generalizable knowledge about the behavior of the students and the teaching strategies (small estimation errors), but, at the same time, the system needs to avoid giving players a bad experience, which may make them leave the system permanently (large reward). In this paper, we formalize this tradeoff and introduce the ForcingBalance algorithm, whose performance is provably close to that of the best possible tradeoff strategy. Finally, we demonstrate on real-world educational data that ForcingBalance returns useful information about the arms without compromising the overall reward.
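
    To make the tradeoff concrete, one natural way to score a fixed allocation of pulls is average reward minus a weight times the worst variance of the arm-mean estimates. This scoring rule is an illustrative assumption, not necessarily the objective formalized in the paper.

        import numpy as np

        def tradeoff_score(alloc, means, variances, horizon, w):
            # Score = average reward - w * worst variance of the arm-mean estimates.
            # An illustrative objective, not necessarily the paper's formulation.
            alloc = np.asarray(alloc, dtype=float)
            alloc = alloc / alloc.sum()                    # fraction of pulls per arm
            avg_reward = alloc @ np.asarray(means)
            worst_error = np.max(np.asarray(variances) / (alloc * horizon))
            return avg_reward - w * worst_error

        # With enough weight on estimation error, a mixed allocation beats pure
        # exploitation even though it collects less reward.
        for alloc in ([0.01, 0.01, 0.98], [0.2, 0.2, 0.6]):
            print(alloc, tradeoff_score(alloc, [0.2, 0.5, 0.8], [1, 1, 1], 10_000, w=50))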

    Pliable rejection sampling

    Get PDF
    Rejection sampling is a technique for sampling from difficult distributions, but its use is limited by a high rejection rate. Common adaptive rejection sampling methods either work only for very specific distributions or come without performance guarantees. In this paper, we present pliable rejection sampling (PRS), a new approach to rejection sampling in which we learn the sampling proposal using a kernel estimator. Since our method builds on rejection sampling, the samples obtained are, with high probability, i.i.d. and distributed according to the target density f. Moreover, PRS comes with a guarantee on the number of accepted samples.
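
    A minimal sketch of the pipeline the abstract outlines: draw a pilot sample with crude rejection sampling, fit a kernel density estimator to it, and then rejection-sample using the inflated KDE as the proposal. The example target f, the uniform pilot proposal, and the inflation constant c are illustrative assumptions; the paper's PRS gives the actual construction and guarantees, while this sketch only accepts correctly where c * g(x) >= f(x) holds.

        import numpy as np
        from scipy.stats import gaussian_kde

        rng = np.random.default_rng(0)

        def f(x):
            # Example unnormalized target density (an assumption): a bimodal mixture.
            return 0.6 * np.exp(-0.5 * (x - 2.0) ** 2) + 0.4 * np.exp(-0.5 * (x + 2.0) ** 2)

        def pilot_samples(n, lo=-10.0, hi=10.0, m=1.0):
            # Crude rejection sampling from a wide uniform proposal; m must bound f.
            out = np.empty(0)
            while out.size < n:
                x = rng.uniform(lo, hi, size=4 * n)
                u = rng.uniform(0.0, m, size=4 * n)
                out = np.concatenate([out, x[u < f(x)]])
            return out[:n]

        def prs_sketch(n, kde, c=1.5):
            # Rejection sampling with the learned KDE proposal, inflated by c.
            # Correct only where c * kde(x) >= f(x); c is a heuristic margin here.
            accepted = np.empty(0)
            while accepted.size < n:
                x = kde.resample(4 * n)[0]               # candidates from the proposal
                u = rng.uniform(size=x.size)
                accepted = np.concatenate([accepted, x[u * c * kde(x) < f(x)]])
            return accepted[:n]

        kde = gaussian_kde(pilot_samples(500))           # learn the proposal from a pilot run
        samples = prs_sketch(2000, kde)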
